Abstract

We present a solution to one of the fundamental problems
in computer graphics, hidden surface removal. In most
3D graphics systems, hidden surface removal is done
using the Z-Buffer algorithm. This method, however,
requires a read-modify-write memory access for each
visible pixel, which severely limits performance. We
introduce a novel SRAM cell that incorporates the logic
needed to perform the Z-Buffer algorithm on its own.
When such cells are placed into the page register of
conventional DRAMs, almost any pixel rate can be achieved.

1 Introduction 

A common representation of 3-dimensional objects in 
computer graphics applications is an approximation of 
their surface by a sufficiently large number of polygons, 
mostly triangles. Graphical output devices nowadays are 
almost exclusively raster devices, that is, the display area 
is comprised of a set of discrete pixels. Each pixel on the 
screen is associated with one entry in the display buffer (or 
Frame Buffer), which holds the color of the pixel. For dis- 
play purposes, each triangle must therefore be decom- 
posed into the set of screen pixels it covers, a process 
called rasterization. Current state-of-the-art rasterizer 
units are capable of producing in excess of 100M pixels/s, 
if fed with triangles at an appropriate rate. 

The non-trivial task of determining the visible parts of
an object can be solved by computing for each pixel a
measure of its distance to the observer, called the Z-Value,
and by comparing that Z-Value to the Z-Value of the pixel
previously generated for this screen address. This is the
so-called Z-Buffer Algorithm [1],[2], which requires
additional memory capacity to hold the Z-Values.

Thus, before a pixel at screen address (x,y) can be
written, the value in the Z-Buffer at address (x,y) must be
read and compared to the newly generated one. If the new
pixel is nearer to the observer, its Z-Value must be written
into the Z-Buffer, and its color is stored in the Frame
Buffer to be displayed on the screen. The new pixel is
discarded if it is behind the old one.
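
The update rule just described can be sketched in software as follows.
This is a minimal illustration, not the circuit proposed in this paper:
the buffer size, the FAR initial value and the access counter are
assumptions made for the example, and a smaller Z-Value is taken to mean
"nearer to the observer".

```python
# Minimal software model of the classic Z-Buffer update (illustrative only).
# Assumption: a smaller Z-Value means nearer to the observer.

WIDTH, HEIGHT = 4, 4
FAR = 0xFFFFFFFF  # hypothetical "infinitely far" initial value for 32-bit Z

z_buffer = [[FAR] * WIDTH for _ in range(HEIGHT)]
frame_buffer = [[0] * WIDTH for _ in range(HEIGHT)]
memory_accesses = {"read": 0, "write": 0}  # Z-Buffer accesses only


def plot(x, y, z_new, color):
    """Write one rasterized pixel; return True if it was visible."""
    memory_accesses["read"] += 1          # read the old Z-Value
    z_old = z_buffer[y][x]
    if z_new < z_old:                     # new pixel is nearer
        memory_accesses["write"] += 1     # write back the new Z-Value
        z_buffer[y][x] = z_new
        frame_buffer[y][x] = color
        return True
    return False                          # new pixel is hidden: discard it


visible_first = plot(1, 2, 500, color=7)   # empty pixel: visible
visible_behind = plot(1, 2, 900, color=3)  # farther away: discarded
visible_front = plot(1, 2, 100, color=5)   # nearer: replaces color 7
```

Every call reads the Z-Buffer once, and every visible pixel adds a write,
which is the read-modify-write pattern the text refers to.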



Obviously, the algorithm bears an inherent memory
bandwidth problem. With a typical screen resolution of about
1M pixels and 16 to 32 bits per Z-Value, the high cost
prohibits the use of fast SRAM devices. Slow DRAM
devices force the system designer to build highly
interleaved and expensive memory systems to keep up with the
fast rasterizer units.
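
To give a feeling for the numbers involved, the following sketch works
out the Z-Buffer capacity and bandwidth implied by the figures quoted
above (100M pixels/s, about 1M pixels, 16 to 32 bits per Z-Value).
Taking "1M" as 2^20 and treating every pixel as visible (one read plus
one write-back) are assumptions of this example, not measurements.

```python
# Back-of-the-envelope Z-Buffer demand from the figures quoted in the text.
# Assumptions: 1M pixels = 2**20, and every pixel is visible (worst case).

PIXEL_RATE = 100e6        # pixels per second delivered by the rasterizer
SCREEN_PIXELS = 1 << 20   # about 1M pixels


def z_buffer_demand(bits_per_z):
    """Return (capacity in MiB, read bandwidth, read+write bandwidth in B/s)."""
    bytes_per_z = bits_per_z // 8
    capacity_mib = SCREEN_PIXELS * bytes_per_z / 2**20
    read_bw = PIXEL_RATE * bytes_per_z      # one Z read per pixel
    rmw_bw = 2 * read_bw                    # plus one Z write-back per pixel
    return capacity_mib, read_bw, rmw_bw


cycle_budget_ns = 1e9 / PIXEL_RATE          # time available per pixel

for bits in (16, 32):
    capacity, read_bw, rmw_bw = z_buffer_demand(bits)
    print(f"{bits}-bit Z: {capacity:.0f} MiB, "
          f"{read_bw / 1e6:.0f} MB/s read, {rmw_bw / 1e6:.0f} MB/s read+write")
```

Under these assumptions a full read-modify-write has to complete within
10 ns per pixel, which is the budget a memory system would have to meet.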

As a solution, we propose to integrate the compare
logic into the Z-Buffer memory devices, and to perform
the complete Z-Buffer algorithm on chip. In this way, the
enormous internal bandwidth is available, and the
read-modify-write cycles are turned into mere write cycles
from an external point of view. In its basic configuration,
the Z-Buffer only outputs the farther/nearer flag as a
control signal for the Frame Buffer.

However, the compare logic must be fast, compact and
compatible with the DRAM technology to be easily
integrated. In this paper we describe a logic-embedded
SRAM cell, called the CBit cell, capable of performing the
Z-compare operation, and a memory architecture, called
ZRAM, which incorporates this cell for extremely high
pixel rates.

2 CBit cells 

For explanation purposes, we assume a Z-resolution of
32 bits. Let us consider the logic-embedded SRAM cell in
Figure 1, which holds the MSB ZO31 of the old Z-Value. It
must be compared to the newly generated MSB ZN31,
which is put on the true and inverted bit lines. The upper
half of the schematic consists of a common 6-transistor
CMOS static RAM cell. The remaining seven transistors
perform the compare operation. They are mainly N-type,
and therefore the cell is called an N-type CBit cell (there
is a corresponding P-type cell, which will be introduced
later). The operation is as follows:

□ Prior to any access, the write signal WR and the select
signal S31 are held low, and the nearer flag NN is
precharged high. Thus, S30 = 1.

□ An access starts by placing the incoming ZN31 bit and
its inverted value on the corresponding bit lines. Then,
S31 is activated and NN is left floating.
Figure 1: N-type CBit cell

This will produce logical values on the output lines as 
given in Table 1. 
Table 1: Functional behavior of N-type CBit cells

Thus, the NN signal goes active (low) if ZN31 < ZO31,
indicating at the same time that the new Z-Value ZN[31:0]
is smaller than the stored Z-Value ZO[31:0]. In the case
ZN31 > ZO31 the decision is made in the MSB as well.
Only in the case ZN31 = ZO31 must the next lower bit be
tested. Thus, the S30 line can be used to activate the CBit
cell holding ZO30.
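
The decision rule above amounts to an MSB-first, bit-serial comparison.
The following sketch models only that logic, not the transistor-level
behavior; the function name and the choice to report equal values as
"not nearer" are assumptions of this example.

```python
def ripple_compare(zn, zo, bits=32):
    """Return (nearer, decided_at_bit) for an MSB-first bit-serial compare.

    nearer is True when zn < zo. decided_at_bit is the index of the bit
    that settled the comparison, or None if all bits were equal.
    """
    for i in range(bits - 1, -1, -1):      # bit 31 first, bit 0 last
        zn_i = (zn >> i) & 1
        zo_i = (zo >> i) & 1
        if zn_i < zo_i:                    # ZN_i = 0, ZO_i = 1: nearer
            return True, i
        if zn_i > zo_i:                    # ZN_i = 1, ZO_i = 0: farther
            return False, i
        # Equal bits: the select signal ripples on to the next lower cell.
    return False, None                     # identical values (assumed: keep old)


decided_in_bit_3 = ripple_compare(0b0101, 0b1001)
decided_in_lsb = ripple_compare(0b0110, 0b0111)
```

The second return value shows how far the select signal had to ripple
before a cell could decide, which is what determines the compare time.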

However, we have to consider that the active level of
the select signal has changed from the input to the output
of the cell. This seems to be very impractical, and one
could think about various arrangements of P-type and
N-type transistors at the places T7 - T11. However,
regardless of what type the pass transistors are, they are
good at passing only the voltage level that does not switch
on other pass transistors of the same type. An inverter
would solve the problem, but it would also increase the
propagation delay and the transistor count for each cell.
Therefore, we construct a P-type CBit cell as shown in
Figure 2, which is activated directly by the active-low
S30 signal.




Figure 2: P-type CBit cell 

Consequently, we have to introduce the line NP, indi- 
cating the nearer-condition for P-type CBit cells. It has an 
active high level and is pulled low prior to any access. NP 
and NN are not connected to each other, and only one of 
them can be activated during a given access. 

If selected, the internal state of the P-type CBit cell is 
shown in Table 2. 
Table 2: Functional behavior of P-type CBit cells

This time, the next lower CBit cell is activated by
passing a high level on the S29 line in the case ZN30 = ZO30.
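
The alternating polarities can be made concrete with a small level
model. It is only a sketch built from the prose description: the N-type
cells are assumed to sit at odd bit positions starting with the MSB,
an unselected cell is assumed to leave its output select line at the
inactive level, and the function name is hypothetical.

```python
def cbit_chain(zn, zo, bits=32):
    """Model the line levels of an alternating N/P CBit chain.

    Returns (nn, np_, levels): the final levels of the NN and NP lines and
    the level seen on each select line, keyed by bit index.
    """
    nn, np_ = 1, 0                 # NN precharged high, NP pulled low
    n_type = True                  # assumption: the MSB cell is N-type
    select = 1                     # the MSB select line is activated (high)
    levels = {bits - 1: select}
    for i in range(bits - 1, -1, -1):
        # N-type cells are selected by a high level, P-type by a low level.
        active = select == (1 if n_type else 0)
        zn_i, zo_i = (zn >> i) & 1, (zo >> i) & 1
        if active and zn_i < zo_i:
            if n_type:
                nn = 0             # N-type cell pulls NN low (active)
            else:
                np_ = 1            # P-type cell pulls NP high (active)
        equal = active and zn_i == zo_i
        # The output select line uses the polarity of the next cell type.
        if n_type:
            select = 0 if equal else 1   # next cell is P-type, active low
        else:
            select = 1 if equal else 0   # next cell is N-type, active high
        if i > 0:
            levels[i - 1] = select
        n_type = not n_type
    return nn, np_, levels


nn, np_, levels = cbit_chain(0b10, 0b11, bits=2)
nearer = nn == 0 or np_ == 1
```

In this model at most one of NN and NP is ever activated during an
access, and the pixel is nearer exactly when one of them is active.
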
4 Performance



A sample implementation of the circuitry was done at
the IBM Development Laboratory at Böblingen using
IBM's CMOS5L technology (3.3V, 0.5µm effective channel
length). The following simulation results were
obtained: the select signal ripples through N-type and
P-type cells in about 0.1ns and 0.15ns, respectively. After
being selected, it takes 0.24ns for an N-type cell to activate
the NN line, and 0.56ns for a P-type cell to pull up the NP
line. Figure 5 shows the timing diagram of one N- and
P-type CBit cell combination. The markers indicate delays
as explained below.

Interval            Condition                   Delay

V2 - V1, V4 - V3    ZN31 = ZO31                 0.25ns
V6 - V5             Nearer-Condition (N-type)   0.24ns
V8 - V7             Nearer-Condition (P-type)   0.66ns
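
The per-cell figures quoted in the text can be combined into a rough
worst-case estimate for a 32-bit chain. The accounting below is an
assumption of this sketch rather than the authors' stated method: the
worst case is taken to be the one in which all higher bits are equal, so
that the select signal ripples through bits 31 to 1 and the LSB cell
makes the decision, with N-type and P-type cells alternating from an
N-type MSB. The result is close to the worst-case compare time of about
4.4ns reported in this section.

```python
# Rough worst-case delay for an alternating N/P chain, using the simulated
# per-cell figures quoted in the text. The accounting is an assumption.
RIPPLE_N_NS = 0.10     # select ripple through one N-type cell
RIPPLE_P_NS = 0.15     # select ripple through one P-type cell
NEARER_N_NS = 0.24     # selected N-type cell activating NN
NEARER_P_NS = 0.56     # selected P-type cell pulling up NP


def worst_case_compare_ns(bits=32):
    """Delay when the decision falls in the LSB (all higher bits equal)."""
    total = 0.0
    n_type = True                        # assumption: the MSB cell is N-type
    for _ in range(bits - 1, 0, -1):     # select ripples through bits 31..1
        total += RIPPLE_N_NS if n_type else RIPPLE_P_NS
        n_type = not n_type
    # The LSB cell finally drives its nearer line.
    total += NEARER_N_NS if n_type else NEARER_P_NS
    return total


estimate_ns = worst_case_compare_ns()    # 16 N ripples + 15 P ripples + 0.56
```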



In this implementation, a worst-case 32-bit compare
operation takes about 4.4ns. However, the compare time
can be reduced substantially without increasing the
hardware expenses significantly. This is accomplished by
breaking the select chain into a number of shorter
sub-chains, activating the select signals of the CBit cells
holding the MSB of each sub-chain simultaneously, and
combining the results. An example is shown in Figure 6,
where the 32-bit chain is divided into 4 sub-chains,
thereby reducing the overall delay time to approximately
one fourth. The additional hardware expenses are as low
as 12 transistors.
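
One way to picture the sub-chain scheme is the following functional
sketch. How the results are combined in the actual circuit is not
spelled out here, so the combining rule below (a sub-chain's "less"
result counts only if all more significant sub-chains report equality)
is an assumption, as are the function names.

```python
def subchain_result(zn, zo, hi, width):
    """Compare bits hi .. hi-width+1 MSB-first; return (less, equal)."""
    for i in range(hi, hi - width, -1):
        zn_i, zo_i = (zn >> i) & 1, (zo >> i) & 1
        if zn_i != zo_i:
            return zn_i < zo_i, False
    return False, True


def split_compare(zn, zo, bits=32, chains=4):
    """Evaluate all sub-chains independently, then combine the results."""
    width = bits // chains
    # In hardware these comparisons would run at the same time.
    results = [subchain_result(zn, zo, hi, width)
               for hi in range(bits - 1, -1, -width)]
    nearer = False
    all_higher_equal = True
    for less, equal in results:          # most significant sub-chain first
        nearer = nearer or (all_higher_equal and less)
        all_higher_equal = all_higher_equal and equal
    return nearer


decided_in_second_chain = split_compare(0x00FF0000, 0x01000000)
```

Because each sub-chain ripples through only 8 cells instead of 32, the
longest ripple path shrinks accordingly; the combining step adds a small
amount of logic on top.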
Figure 5: Timing diagram of one N- and P-type CBit cell combination

5 Architecture of the ZRAM-Chip

For the Z-Buffer, we propose to modify standard
DRAM devices such that the page register is constructed
from CBit cells. In the simplest configuration, the ZRAM
chip has a 32-bit data interface for the Z-Values, and
therefore accepts pixel addresses. The page register is
organized such that 32 adjacent CBit cells hold the Z-Value
of one pixel. Supposing the DRAM block is organized as
1K x 1K memory cells, the page register stores the
Z-Values of 32 pixels. Most rasterizers operate in a
screen-line-oriented fashion, so that assigning screen line
fragments to DRAM pages will give a high percentage of
page hits. Nevertheless, any page fault will result in a
severe performance loss, because a single Z-compare
operation takes so little time by comparison. Thus we
propose to install multiple page registers, and to use a
page prefetch mechanism. The operation is as follows:
upon receipt, the Z-Value and pixel address are stored in
FIFO memories that are also placed on the ZRAM chip.
While the appropriate page access is performed, new data
is allowed to enter the chip, and succeeding page faults
are detected. As soon as operation starts with the
just-loaded page register, a new page access is initiated,
and so forth. A block diagram of the proposed
architecture is shown in Figure 7. For simplicity, we
assume that the DRAM device is built as one single memory
block. Due to the pipelined operation, the nearer flags
are handed out to the Frame Buffer Controller with a
certain latency.
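
The external behavior described above can be illustrated with a toy
model. It is a deliberately simplified sketch: it keeps only one page
register and no prefetch, so it counts page faults rather than hiding
them, and the class name, the FAR initial value and the linear
address-to-page mapping are assumptions of the example.

```python
from collections import deque

PIXELS_PER_PAGE = 32          # 1K-bit page divided by 32-bit Z-Values
FAR = 0xFFFFFFFF              # hypothetical initial "far" Z-Value


class ZRamModel:
    """Toy behavioral model: FIFO input and a single page register."""

    def __init__(self, pages=1024):
        self.dram = [[FAR] * PIXELS_PER_PAGE for _ in range(pages)]
        self.fifo = deque()           # pending (address, z) writes
        self.open_page = None         # page currently in the page register
        self.page_faults = 0

    def write(self, address, z):
        """External interface: a pure write of pixel address and Z-Value."""
        self.fifo.append((address, z))

    def drain(self):
        """Process queued writes; return the nearer flags in arrival order."""
        flags = []
        while self.fifo:
            address, z = self.fifo.popleft()
            page, offset = divmod(address, PIXELS_PER_PAGE)
            if page != self.open_page:        # page fault: load a new page
                self.page_faults += 1
                self.open_page = page
            register = self.dram[page]        # stands in for the CBit cells
            nearer = z < register[offset]     # compare performed "on chip"
            if nearer:
                register[offset] = z
            flags.append(nearer)
        return flags


zram = ZRamModel(pages=4)
for address in range(64):                     # two screen-line fragments
    zram.write(address, 100)
first_flags = zram.drain()
```

Writing 64 consecutive pixel addresses touches only two pages in this
model, which mirrors the high page-hit rate expected from screen-line
oriented rasterizers.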



6 Conclusion 

We presented a memory design which represents the
ultimate solution to the notorious Z-Buffer Bottleneck. No
new technology is required, and additional hardware
expenses are small. Not described in this paper (since
they are beyond the technical focus of the Conference) are
further possibilities of this design, such as supporting
Anti-Aliasing on sub-pixel resolution and further
increasing the pixel rate by placing parts of the
rasterizer unit onto the ZRAM as well.